Designing and identifying β-hairpin peptide macrocycles with antibiotic potential

Peptide macrocycles are a rapidly emerging class of therapeutics, yet designing their structure and activity remains challenging. This is especially true for macrocycles with β-hairpin structure, owing to their weak folding properties and propensity to aggregate. Here, we use proteomic analysis and common antimicrobial features to design a large library of peptides with macrocyclic β-hairpin structure. Using an activity-driven high-throughput screen, we identify dozens of peptides that kill bacteria through selective membrane disruption and analyze their biochemical features via machine learning. Active peptides contain a unique constrained structure and are highly enriched for cationic charge, with arginine in their turn region. Our results provide a synthetic strategy for structured macrocyclic peptide design and discovery while also elucidating characteristics important for β-hairpin antimicrobial peptide activity.


Supplementary Text
SynCH peptide synthesis and SLAY activity
For consistency, all SynCH peptides were resuspended in water at 10 mg/ml and then visually inspected for insolubility (Data S1). SLAY identifies plasmids expressing peptide fusions that slow bacterial growth. Many of the peptides with solubility issues were also inactive in vitro. These peptides, or others, may have slowed bacterial growth in SLAY by aggregating intracellularly before being displayed on the cell surface. To investigate whether soluble SySA peptides are prone to aggregation, we examined SySA-49 by circular dichroism (CD) at concentrations ranging from 50 µg/ml to 450 µg/ml (fig. S4C). Its molar ellipticity remained relatively constant as concentration increased, suggesting that SySA-49, and likely other soluble peptides, are not prone to strong aggregation in solution.
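The concentration comparison relies on normalizing raw CD signal to molar ellipticity, [θ] = θ_obs(mdeg) / (10 · c · l), with c the molar peptide concentration and l the path length in cm, so that spectra recorded at different concentrations can be overlaid. The sketch below is illustrative only: the wavelengths, raw readings, molecular weight (2000 g/mol) and 0.1 cm path length are all hypothetical, not the fig. S4C data. It reports the largest relative spread between concentrations at any wavelength (small if the normalized spectra overlap) and where each spectrum's global minimum falls, a simplification of the "single minimum between 215 and 220 nm" criterion used to flag β-hairpin structure.

```python
# Hypothetical inputs for illustration; NOT the fig. S4C measurements.
WAVELENGTHS_NM = [210, 215, 218, 220, 230]
MW_G_PER_MOL = 2000.0  # assumed peptide molecular weight
PATH_CM = 0.1          # assumed cuvette path length
RAW_MDEG = {           # peptide concentration (µg/ml) -> raw ellipticity (mdeg)
    50: [-0.20, -0.30, -0.33, -0.28, -0.05],
    150: [-0.61, -0.90, -0.98, -0.83, -0.15],
    450: [-1.79, -2.71, -2.95, -2.49, -0.46],
}


def molar_ellipticity(theta_mdeg, conc_ug_per_ml, mw=MW_G_PER_MOL, path_cm=PATH_CM):
    """[θ] = θ_obs(mdeg) / (10 · c(mol/L) · l(cm)), in deg·cm²·dmol⁻¹."""
    conc_molar = conc_ug_per_ml * 1e-3 / mw  # µg/ml = mg/L -> g/L -> mol/L
    return theta_mdeg / (10.0 * conc_molar * path_cm)


def max_relative_spread(spectra):
    """Largest (max - min) / |mean| across spectra at any shared wavelength."""
    worst = 0.0
    for values in zip(*spectra):
        mean = sum(values) / len(values)
        if mean != 0:
            worst = max(worst, (max(values) - min(values)) / abs(mean))
    return worst


def wavelength_of_minimum(spectrum, wavelengths=WAVELENGTHS_NM):
    """Wavelength at which the spectrum is most negative (global minimum only)."""
    return min(zip(spectrum, wavelengths))[1]


spectra = {c: [molar_ellipticity(t, c) for t in raw] for c, raw in RAW_MDEG.items()}
spread = max_relative_spread(list(spectra.values()))
minima = {c: wavelength_of_minimum(s) for c, s in spectra.items()}
print(f"max relative spread across concentrations: {spread:.1%}")
print("wavelength of minimum (nm) per concentration:", minima)
```

Raw signal grows roughly nine-fold from 50 to 450 µg/ml, yet the normalized spectra nearly coincide; a concentration-dependent drift in [θ] would instead hint at aggregation.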
SynCH peptide antibiotics can be optimized for therapeutic potential
We asked whether the naïve peptide sequences discovered through our screen could be optimized to improve their activity. Our biochemical characterization suggests that high charge is important. Previous optimization of another SLAY-identified β-AMP found that shorter peptide length and additional disulfide bonds increased its potency (24). We therefore generated a 27-peptide optimization library around our most potent peptide (SySA-5) by shortening its length, increasing its charge, and adding the potential for a second intramolecular disulfide bond, while maintaining the alternating side-chain properties of residues in the antiparallel β-sheets (table S3).
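The three levers used to build the library (length, charge, and cysteine count) are easy to tabulate per sequence. The sketch below is a minimal illustration under stated assumptions: net charge is approximated at neutral pH as +1 per Lys/Arg and -1 per Asp/Glu (His, the termini and any terminal modifications are ignored), and both sequences are hypothetical placeholders, not SySA-5 or its table S3 variants.

```python
# Crude net charge near pH 7 (assumption): Lys/Arg +1, Asp/Glu -1.
# His, the N/C termini and terminal modifications are ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def summarize(seq):
    """Tabulate the properties varied in the optimization library."""
    seq = seq.upper()
    n_cys = seq.count("C")
    return {
        "length": len(seq),
        "net_charge": sum(CHARGE.get(aa, 0) for aa in seq),
        "cysteines": n_cys,
        # Upper bound only: pairing is possible, formation is not predicted.
        "max_disulfides": n_cys // 2,
    }


# Hypothetical parent and variant, NOT the real SySA-5 sequences.
parent = "GKCRVEWNGKVTCYE"
variant = "RCKVCWRGKCVRC"  # shorter, more cationic, four cysteines

for name, seq in [("parent", parent), ("variant", variant)]:
    print(name, summarize(seq))
```

In this toy pair the variant is two residues shorter, its approximate net charge rises from +1 to +5, and its four cysteines allow up to two intramolecular disulfides.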
We had this library commercially synthesized and tested its ability to inhibit the growth of Acinetobacter baumannii AB5075, a clinically isolated carbapenem-resistant Acinetobacter (CRA) pathogen. MICs were measured in MH and in 100% FBS, which is likely more representative of the in vivo environment. In MH, 23 of our 27 variants were more potent than SySA-5, by as much as 8-fold. Additionally, eleven variants gained the ability to inhibit A. baumannii AB5075 growth in FBS (table S3). This was surprising, as degradation in serum commonly limits the use of peptides. To test whether increased potency came with increased toxicity, we also measured each variant's hemolysis at 128 µg/ml and reported it as a fold change relative to SySA-5 (table S3). None of the variants showed an equivalent increase in hemolysis, and most had less than a two-fold increase relative to SySA-5. The most potent variant (SySA 5.17) had the same MIC against A. baumannii AB5075 in FBS as Protegrin-1 (32 µg/ml) but was more than ten-fold less hemolytic at 128 µg/ml. These data suggest that SynCH peptides can be readily optimized to have greater therapeutic potential than naturally occurring β-AMPs.
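Because MICs are read from two-fold dilution series, potency gains are naturally expressed as ratios to the parent MIC (an 8-fold gain is three dilution steps), and hemolysis is reported here as a fold change relative to SySA-5. The sketch below shows that arithmetic with hypothetical MICs (µg/ml) and percent-lysis values; none of the numbers or variant names come from table S3, and treating "no inhibition in FBS" as `None` is an assumption of the sketch.

```python
import math

# Hypothetical values for illustration only (NOT table S3 data).
# MICs in µg/ml; None = no inhibition at the highest concentration tested.
# "hemolysis" = percent lysis at 128 µg/ml.
PARENT = {"mic_mh": 64.0, "mic_fbs": None, "hemolysis": 4.0}
VARIANTS = {
    "variant_a": {"mic_mh": 8.0, "mic_fbs": 32.0, "hemolysis": 6.0},
    "variant_b": {"mic_mh": 32.0, "mic_fbs": None, "hemolysis": 3.0},
    "variant_c": {"mic_mh": 64.0, "mic_fbs": None, "hemolysis": 4.4},
}


def potency_fold(parent_mic, variant_mic):
    """Fold gain in potency; 8.0 means the variant's MIC is 8x lower."""
    return parent_mic / variant_mic


def hemolysis_fold(parent_lysis, variant_lysis):
    """Variant hemolysis as a fold change relative to the parent."""
    return variant_lysis / parent_lysis


def summarize(parent, variants):
    rows = {}
    for name, v in variants.items():
        fold = potency_fold(parent["mic_mh"], v["mic_mh"])
        rows[name] = {
            "potency_fold_mh": fold,
            "dilution_steps": math.log2(fold),  # one step per 2-fold change
            "gained_fbs_activity": parent["mic_fbs"] is None and v["mic_fbs"] is not None,
            "hemolysis_fold": hemolysis_fold(parent["hemolysis"], v["hemolysis"]),
        }
    return rows


rows = summarize(PARENT, VARIANTS)
improved = [n for n, r in rows.items() if r["potency_fold_mh"] > 1]
print(improved)  # ['variant_a', 'variant_b']
```

Reading potency and hemolysis as paired ratios makes the comparison in the text direct: a variant is attractive when its potency fold clearly exceeds its hemolysis fold.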

Machine learning algorithm and modelling
For this model, MIC values obtained for the top 81 SySA peptides were log2-transformed, and amino acid sequences were embedded as numerical vectors using the Bepler deep protein language model (35) from the bio-embeddings Python library (38). We then used AutoML (36) to fit a range of predictors to 80% of the SySA biochemical data (the training data) and validated their performance on the remaining 20% (the validation data). Each fitted predictor takes the numerical embedding of a peptide as input and outputs a score, where a lower score indicates a higher likelihood of antibacterial activity. The predictors considered by AutoML included linear models, random forests, neural networks, and gradient boosting.
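The workflow described above (log2-transform MICs, embed sequences, hold out 20% for validation, score peptides so that lower means more active) can be sketched without heavy dependencies. This is a hedged stand-in, not the authors' pipeline: the Bepler embedding is replaced by a simple amino-acid-composition vector, the AutoML-selected model by a k-nearest-neighbour regressor, and the ten sequences and MICs are hypothetical. Reproducing the actual analysis would mean swapping in the bio-embeddings Bepler model and an AutoML/LightGBM regressor.

```python
import math
import random

# Hypothetical sequences and MICs (µg/ml), NOT SySA data.
DATA = [
    ("GKCRVEWNGKVTCYE", 64), ("RCKVCWRGKCVRC", 4), ("KCRVWVRGNRKVWCK", 8),
    ("GECTVEWNGDVTCYE", 128), ("RCRVRWRGRRVRC", 2), ("ACSVAWNGQVTCAE", 128),
    ("KCKVYWKGNKVYCK", 16), ("RCWVRVRGKRVWCR", 4), ("GQCTVPWNGAVTCQE", 128),
    ("KCIVRWKGRKIRCK", 8),
]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_embedding(seq):
    """Stand-in for the Bepler language-model embedding (an assumption):
    the fraction of each standard amino acid in the sequence."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def train_validation_split(items, train_frac=0.8, seed=0):
    """Deterministically shuffle, then split into training and validation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(train_frac * len(items))
    return items[:cut], items[cut:]


def knn_score(query, training, k=3):
    """Stand-in for the fitted predictor: mean log2(MIC) of the k nearest
    training embeddings. Lower scores mean lower predicted MIC (more active)."""
    nearest = sorted((math.dist(query, emb), y) for emb, y in training)[:k]
    return sum(y for _, y in nearest) / len(nearest)


records = [(composition_embedding(s), math.log2(mic)) for s, mic in DATA]
train, valid = train_validation_split(records)
errors = [abs(knn_score(emb, train) - y) for emb, y in valid]
print(f"{len(train)} training / {len(valid)} validation peptides")
print(f"validation MAE: {sum(errors) / len(errors):.2f} log2(MIC) units")
```

Working in log2(MIC) puts every two-fold dilution step at the same distance, so an error of 1.0 means the prediction is off by one dilution.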
The best-performing predictor found by AutoML was a LightGBM model (37) with 16 selected features. Because these features are embedding scores, we could not interpret them directly. To gain some insight into them nonetheless, we correlated them with basic peptide properties such as amino acid composition, length, charge, and hydrophobicity. Three features perfectly represented the fractions of alanine, proline, and glutamine in a peptide. These and other features had no straightforward interpretation or obvious correlation with our library design. Correlations with peptide length, molecular weight, or cysteine content were also common; however, these correlations were relatively weak, with R² values between 10% and 38%. These observations highlight that protein embedding scores capture non-trivial aspects of peptide biochemistry that do not necessarily correspond to simple biochemical quantities (Fig. 4A).

Fig. S4. Circular dichroism spectra of the inactive (A) and active (B) top 88 SySA peptides. (C) SySA-49 molar ellipticity spectra at the listed concentrations. Spectra with a single molar ellipticity minimum between 215 and 220 nm are considered to contain β-hairpin secondary structure. Each spectrum is the average of three measurements of the same sample with background removed.

Data S1. (separate file)
Excel spreadsheet with chemically synthesized peptide properties and biochemical data.

Data S2. (separate file)
Excel spreadsheet with all SynCH SLAY data analysis and potency score predictions